809 research outputs found

    Decoding billions of integers per second through vectorization

    In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and SIMD instructions. Nevertheless, we introduce a novel vectorized scheme called SIMD-BP128 that improves over previously proposed vectorized approaches. It is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the same time, SIMD-BP128 saves up to 2 bits per integer. For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while being two times faster during decoding. Comment: For software, see https://github.com/lemire/FastPFor; for data, see http://boytsov.info/datasets/clueweb09gap
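The abstract above does not describe how its schemes work internally, so the following is only a minimal scalar sketch of the general idea behind packing integers into fewer bits. It assumes a sorted list of non-negative integers, stores the gaps between neighbours, and packs every gap with the bit width of the largest gap. The function names are hypothetical, and nothing here is vectorized or taken from the FastPFor library.

```python
def delta_encode(values):
    """Replace each value of a sorted list by its gap from the previous one."""
    gaps, prev = [], 0
    for v in values:
        gaps.append(v - prev)
        prev = v
    return gaps


def delta_decode(gaps):
    """Invert delta_encode with a running sum."""
    values, total = [], 0
    for g in gaps:
        total += g
        values.append(total)
    return values


def pack(block):
    """Pack non-negative integers using the bit width of the largest one.

    Returns (width, count, word), where word is one Python integer holding
    all values back to back, lowest index in the lowest bits.
    """
    width = max(block).bit_length() if block else 0
    word = 0
    for i, v in enumerate(block):
        word |= v << (i * width)
    return width, len(block), word


def unpack(width, count, word):
    """Recover the packed integers by shifting and masking."""
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(count)]


doc_ids = [3, 7, 8, 15, 16, 23]
gaps = delta_encode(doc_ids)           # [3, 4, 1, 7, 1, 7]
width, count, word = pack(gaps)        # every gap fits in 3 bits
restored = delta_decode(unpack(width, count, word))
```

In this toy example six integers occupy 18 bits plus a small header instead of six machine words. A practical scheme would work on fixed-size blocks and unpack several values per instruction.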

    Fast Hands-free Writing by Gaze Direction

    We describe a method for text entry based on inverse arithmetic coding that relies on gaze direction and which is faster and more accurate than using an on-screen keyboard. These benefits are derived from two innovations: the writing task is matched to the capabilities of the eye, and a language model is used to make predictable words and phrases easier to write. Comment: 3 pages. Final versio

    Investigating five key predictive text entry with combined distance and keystroke modelling

    This paper investigates text entry on mobile devices using only five keys. Intended primarily to support text entry on devices smaller than mobile phones, this method can also be used to maximise screen space on mobile phones. The reported combination of Fitts' law and keystroke modelling predicts that a five-key keypad with bigram prediction performs similarly to standard mobile phones using unigram prediction. The user studies reported here show that performance on five-key pads is similar to that found elsewhere for novice users of nine-key pads.

    Towards an automated classification of spreadsheets

    Many spreadsheets in the wild have neither documentation nor categorization associated with them. This makes it difficult to apply spreadsheet research that targets specific spreadsheet domains such as financial or database. In this paper, we introduce a methodology to automatically classify spreadsheets into different domains. We exploit existing data mining classification algorithms using spreadsheet-specific features. The algorithms were trained and validated with cross-validation using the EUSES corpus, reaching up to 89% accuracy. The best algorithm was applied to the larger Enron corpus in order to gain some insight from it and to demonstrate the usefulness of this work.

    Learning Aligned-Spatial Graph Convolutional Networks for Graph Classification

    In this paper, we develop a novel Aligned-Spatial Graph Convolutional Network (ASGCN) model to learn effective features for graph classification. Our idea is to transform arbitrary-sized graphs into fixed-sized aligned grid structures, and define a new spatial graph convolution operation associated with the grid structures. We show that the proposed ASGCN model not only reduces the problems of information loss and imprecise information representation arising in existing spatially-based Graph Convolutional Network (GCN) models, but also bridges the theoretical gap between traditional Convolutional Neural Network (CNN) models and spatially-based GCN models. Moreover, the proposed ASGCN model can adaptively discriminate the importance between specified vertices during the process of spatial graph convolution, explaining the effectiveness of the proposed model. Experiments on standard graph datasets demonstrate the effectiveness of the proposed model.

    Identifying Critical States by the Action-Based Variance of Expected Return

    The balance of exploration and exploitation plays a crucial role in accelerating reinforcement learning (RL). To deploy an RL agent in human society, its explainability is also essential. However, basic RL approaches have difficulties in deciding when to choose exploitation as well as in extracting useful points for a brief explanation of its operation. One reason for the difficulties is that these approaches treat all states the same way. Here, we show that identifying critical states and treating them specially is commonly beneficial to both problems. These critical states are the states at which the action selection changes the potential of success and failure substantially. We propose to identify the critical states using the variance in the Q-function for the actions and to perform exploitation with high probability on the identified states. These simple methods accelerate RL in a grid world with cliffs and two baseline tasks of deep RL. Our results also demonstrate that the identified critical states are intuitively interpretable regarding the crucial nature of the action selection. Furthermore, our analysis of the relationship between the timing of the identification of especially critical states and the rapid progress of learning suggests there are a few especially critical states that have important information for accelerating RL rapidly. Comment: 12 pages, 6 figure
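The idea in the abstract above, flagging states by the variance of Q-values across actions and exploiting more often there, can be sketched for a tabular Q-function. This is an illustrative sketch only: the fixed `threshold`, the two exploration rates, and all function names are assumptions made for the example, not the authors' settings or code.

```python
import random
from statistics import pvariance


def critical_states(q_table, threshold):
    """Return the states whose Q-values vary strongly across actions.

    q_table maps each state to a list of Q-values, one per action.
    threshold is a hypothetical cut-off chosen for this example.
    """
    return {s for s, qs in q_table.items() if pvariance(qs) > threshold}


def select_action(q_table, state, critical, epsilon=0.1,
                  critical_epsilon=0.0, rng=random):
    """Epsilon-greedy selection with a lower exploration rate on critical states."""
    eps = critical_epsilon if state in critical else epsilon
    qs = q_table[state]
    if rng.random() < eps:
        return rng.randrange(len(qs))              # explore: random action
    return max(range(len(qs)), key=qs.__getitem__)  # exploit: greedy action


# Toy table: one state where the action hardly matters, one next to a cliff.
q = {
    "open_field": [1.0, 1.0, 1.0, 1.0],
    "cliff_edge": [1.0, -100.0, 1.0, 0.5],
}
critical = critical_states(q, threshold=10.0)
action = select_action(q, "cliff_edge", critical)
```

With `critical_epsilon=0.0` the agent always takes the greedy action on flagged states, so it never steps off the cliff by random exploration, while non-critical states keep the usual exploration rate.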

    Performing Feature Selection with ACO

    Summary. The main aim of feature selection is to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features. In real world problems FS is a must due to the abundance of noisy, irrelevant or misleading features. However, current methods are inadequate at finding optimal reductions. This chapter presents a feature selection mechanism based on Ant Colony Optimization in an attempt to combat this. The method is then applied to the problem of finding optimal feature subsets in the fuzzy-rough data reduction process. The present work is applied to two very different challenging tasks, namely web classification and complex systems monitoring.

    Bekenstein entropy bound for weakly-coupled field theories on a 3-sphere

    We calculate the high temperature partition functions for SU(Nc) or U(Nc) gauge theories in the deconfined phase on S^1 x S^3, with scalars, vectors, and/or fermions in an arbitrary representation, at zero 't Hooft coupling and large Nc, using analytical methods. We compare these with numerical results which are also valid in the low temperature limit and show that the Bekenstein entropy bound resulting from the partition functions for theories with any amount of massless scalar, fermionic, and/or vector matter is always satisfied when the zero-point contribution is included, while the theory is sufficiently far from a phase transition. We further consider the effect of adding massive scalar or fermionic matter and show that the Bekenstein bound is satisfied when the Casimir energy is regularized under the constraint that it vanishes in the large mass limit. These calculations can be generalized straightforwardly for the case of a different number of spatial dimensions. Comment: 32 pages, 12 figures. v2: Clarifications added. JHEP versio

    Investigating the effectiveness of client-side search/browse without a network connection

    Search and browse, incorporating elements of information retrieval and database operations, are core services in most digital repository toolkits. These are often implemented using a server-side index, such as that produced by Apache SOLR. However, sometimes a small collection needs to be static and portable, or stored client-side. It is proposed that, in these instances, browser-based search and browse is possible, using standard facilities within the browser. This was implemented and evaluated for varying behaviours and collection sizes. The results show that it was possible to achieve fast performance for typical queries on small- to medium-sized collections.

    Using deep learning for ordinal classification of mobile marketing user conversion

    In this paper, we explore Deep Multilayer Perceptrons (MLP) to perform an ordinal classification of mobile marketing conversion rate (CVR), which allows us to measure the value of product sales when a user clicks an ad. As a case study, we consider big data provided by a global mobile marketing company. Several experiments were held, considering a rolling window validation, different datasets, learning methods and performance measures. Overall, competitive results were achieved by an online deep learning model, which is capable of producing real-time predictions. This article is a result of the project NORTE-01-0247-FEDER-017497, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF). This work was also supported by Fundação para a Ciência e Tecnologia (FCT) within the Project Scope: UID/CEC/00319/201